Audio Support for Scope #480

Draft

BuffMcBigHuge wants to merge 10 commits into main from marco/feat/audio

Conversation

@BuffMcBigHuge (Collaborator) commented Feb 17, 2026

Audio Support for Scope

Summary

Adds end-to-end audio support to Scope's WebRTC streaming pipeline. Pipelines can now return audio alongside video in their output dict; the server buffers, resamples, and streams audio over WebRTC and NDI. A shared media clock keeps audio and video synchronized.

What's New

Backend

  • Pipeline interface: Pipelines may return {"video": ..., "audio": ..., "audio_sample_rate": ...}. Audio keys are optional; pipelines that don't produce audio are unchanged.
  • PipelineProcessor: New audio_output_queue for audio chunks from the pipeline output dict.
  • FrameProcessor: Audio drain thread reads from the last processor's audio queue, resamples to 48 kHz (WebRTC standard), mixes to mono, and buffers 20 ms chunks. get_audio() drains the buffer for the audio track.
  • MediaClock: Shared clock for A/V sync. Both video and audio tracks derive PTS from get_media_time() so RTCP Sender Reports map correctly to NTP.
  • AudioProcessingTrack: aiortc MediaStreamTrack that produces 20 ms frames at 48 kHz. Returns silence when no audio is available to keep the track alive.
  • NDI: NDIOutputSink.send_audio() sends float32 audio via NDIlib_send_send_audio_v2.
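To make the new pipeline contract concrete, here is a minimal sketch of a pipeline that returns the optional audio keys. The `TonePipeline` class and its field names are illustrative, not the actual Scope API; numpy stands in for torch tensors.

```python
import numpy as np

class TonePipeline:
    """Hypothetical pipeline: emits a blank video frame plus a 440 Hz sine chunk per call."""

    def __init__(self, sample_rate=24_000, chunk_ms=40):
        self.sample_rate = sample_rate
        self.chunk_samples = sample_rate * chunk_ms // 1000
        self._t = 0  # running sample counter so the tone is phase-continuous

    def __call__(self):
        video = np.zeros((3, 512, 512), dtype=np.float32)  # [C, H, W]
        t = (np.arange(self.chunk_samples) + self._t) / self.sample_rate
        self._t += self.chunk_samples
        audio = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)[np.newaxis, :]  # [C, S]
        return {
            "video": video,
            "audio": audio,                         # optional key
            "audio_sample_rate": self.sample_rate,  # optional key
        }

out = TonePipeline()()
```

A pipeline without audio would simply omit the last two keys, matching the "pipelines that don't produce audio are unchanged" guarantee above.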

Frontend

  • VideoOutput: Mute/unmute toggle (speaker icon). Starts muted to satisfy browser autoplay policy; user can unmute once the stream is playing.
  • useUnifiedWebRTC: Merges video and audio tracks into a single MediaStream. Adds a recvonly audio transceiver so the SDP offer includes an audio m-line for the backend to attach its track.

WebRTC Handshake

The browser adds addTransceiver("audio", { direction: "recvonly" }) so the offer includes an audio m-line. After setRemoteDescription, the backend finds the audio transceiver, attaches its AudioProcessingTrack, and sets direction to sendonly. The answer then indicates that the server will send audio.
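The server-side half of that handshake can be sketched as follows. The stub classes below only model the aiortc-style objects (`getTransceivers()`, `sender.replaceTrack(...)`, settable `direction`) so the direction-flip logic is runnable; the real code operates on an `RTCPeerConnection` after `setRemoteDescription`.

```python
class _Sender:
    """Stub for an RTCRtpSender."""
    def __init__(self):
        self.track = None
    def replaceTrack(self, track):
        self.track = track

class _Transceiver:
    """Stub for an RTCRtpTransceiver as offered by the browser."""
    def __init__(self, kind):
        self.kind = kind
        self.direction = "recvonly"  # browser offered recvonly
        self.sender = _Sender()

def attach_audio_track(transceivers, audio_track):
    """After setRemoteDescription: find the audio m-line and send on it."""
    for t in transceivers:
        if t.kind == "audio":
            t.sender.replaceTrack(audio_track)
            t.direction = "sendonly"  # answer advertises server -> client audio
            return t
    return None  # no audio m-line in the offer; nothing to attach

tr = attach_audio_track([_Transceiver("video"), _Transceiver("audio")], "audio-track")
```

Returning `None` when the offer has no audio m-line keeps older clients (which never call `addTransceiver("audio", ...)`) working unchanged.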

Architecture

Pipeline.__call__() → {"video": tensor, "audio": tensor, "audio_sample_rate": int}
    │
    ▼
PipelineProcessor.audio_output_queue
    │
    ▼
FrameProcessor._audio_drain_loop (resample to 48 kHz, buffer)
    │
    ├──► AudioProcessingTrack.recv() → WebRTC
    └──► NDIOutputSink.send_audio() → NDI
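The resample/mix/chunk step in the middle of this diagram can be sketched as below. This is a simplified stand-in, not the actual `FrameProcessor` code: resampling is reduced to linear interpolation, and the buffer class name is hypothetical.

```python
import numpy as np

TARGET_RATE = 48_000                 # WebRTC-standard audio rate
CHUNK = TARGET_RATE * 20 // 1000     # 960 samples per 20 ms frame

def to_mono_48k(audio, src_rate):
    """Mix [C, S] float audio to mono and resample to 48 kHz (linear interp)."""
    audio = np.atleast_2d(np.asarray(audio, dtype=np.float32))
    mono = audio.mean(axis=0)
    n_out = int(round(len(mono) * TARGET_RATE / src_rate))
    x_out = np.linspace(0.0, len(mono) - 1, n_out)
    return np.interp(x_out, np.arange(len(mono)), mono).astype(np.float32)

class AudioBuffer:
    """Accumulates samples; yields only complete 20 ms chunks."""
    def __init__(self):
        self._buf = np.empty(0, dtype=np.float32)
    def push(self, samples):
        self._buf = np.concatenate([self._buf, samples])
    def pop_chunks(self):
        chunks = []
        while len(self._buf) >= CHUNK:
            chunks.append(self._buf[:CHUNK])
            self._buf = self._buf[CHUNK:]
        return chunks

buf = AudioBuffer()
# 100 ms of stereo silence at 24 kHz -> 4800 samples at 48 kHz -> five 20 ms chunks
buf.push(to_mono_48k(np.zeros((2, 2400), dtype=np.float32), 24_000))
chunks = buf.pop_chunks()
```

Partial chunks stay in the buffer until enough samples arrive, which is what lets `get_audio()` hand the track uniformly sized 20 ms frames.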

Related

  • Architecture doc.
  • Pipelines that produce audio (e.g. LTX-2) are wired separately; this PR provides the infrastructure.


Signed-off-by: BuffMcBigHuge <marco@bymar.co>
@leszko (Collaborator) left a comment
Added some minor comments, I'll review it more in detail soon. But the general code structure looks fine.

The biggest thing is that I'm not sure the synchronization works correctly. @BuffMcBigHuge I tested this on your LTX-2 and found the video completely out of sync with the audio, so the experience is not good. The video pauses while the audio keeps playing, etc. See the recording I shared. This is something we need to address.

demo_audio.mp4

logger.error(f"Error sending NDI frame: {e}")
return False

def send_audio(
Collaborator:
Nice that you added the audio for NDI as well 🏅

Comment on lines +517 to +519
if not self.pipeline_processors:
time.sleep(0.01)
continue
Collaborator:
Can we maybe start the audio loop AFTER the pipeline processors are created? Then we wouldn't need this check here.

Collaborator (author):
I had struggled with this actually - I'll look into your suggestion.

Comment on lines +531 to +534
if isinstance(audio_tensor, torch.Tensor):
audio_np = audio_tensor.float().numpy()
else:
audio_np = np.asarray(audio_tensor, dtype=np.float32)
Collaborator:
Do we expect pipelines to produce audio in two formats? If not, why do we need this check?

Comment on lines +536 to +544
# Ensure shape is [C, S] (channels, samples)
if audio_np.ndim == 1:
audio_np = audio_np[np.newaxis, :] # mono -> [1, S]

# Mix down to mono for WebRTC (average channels)
if audio_np.shape[0] > 1:
audio_mono = audio_np.mean(axis=0)
else:
audio_mono = audio_np[0]
Collaborator:
Similar question: we have a lot of checks for the audio tensor shape; do we allow returning different formats?


return frame

def _audio_drain_loop(self):
Collaborator:
Nit: I wonder if we could/should move the audio-related code into a separate file, like audio.py. The main reason is that frame_processor.py is getting busy.

if audio_output is not None and audio_sample_rate is not None:
# Detach and move to CPU for downstream consumption
audio_output = audio_output.detach().cpu()
logger.info(
Collaborator:
I think it should be at the debug log level, because the logs get too noisy.

self._audio_buffer.append(audio_mono)
self._audio_buffer_samples += len(audio_mono)
logger.info(
f"[FRAME-PROCESSOR] Audio buffered: {len(audio_mono)} samples "
Collaborator:
I think this should be at the debug level, because the logs get too noisy.

@j0sh left a comment
One note on timing / synchronization. Do the audio and video pipelines run independently?

As far as I can tell, media is timestamped right before sending out to WebRTC, in which case there will likely be desync if the pipeline for one track is delayed compared to another. You might want to propagate a reference timestamp at the beginning of both pipelines earlier in the process, but I don't really know this codebase well enough to suggest where.

Re: sync, there might also be more subtle WebRTC usage issues but I am not sure yet; things look mostly fine from what I can tell. WebRTC playback sync is complex and some of the knobs in frameworks like Pion are a little unintuitive; I'm not too familiar with aiortc at the moment.
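One way to read j0sh's suggestion is to stamp frames with a shared capture time at the head of the pipeline, before the audio and video paths diverge, rather than timestamping at send time. The sketch below is hypothetical (names like `capture_time` are not from the PR); it only illustrates where the timestamp would be attached.

```python
import time

class MediaClockStub:
    """Minimal monotonic media clock, standing in for the PR's MediaClock."""
    def __init__(self):
        self._start = time.monotonic()
    def now(self):
        return time.monotonic() - self._start

def stamp(item, clock):
    """Attach capture_time before the pipelines diverge; both the video and
    audio paths carry it through, so PTS reflects capture time, not send time."""
    item["capture_time"] = clock.now()
    return item

clock = MediaClockStub()
frame = stamp({"video": "..."}, clock)
```

If one pipeline is slower than the other, both tracks would still derive PTS from the same capture instant, which is what lets the receiver re-align them during playback.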

Comment on lines 261 to 262
media_time = self.media_clock.get_media_time()
frame.pts = self.media_clock.media_time_to_audio_pts(media_time)
I think the corresponding call for video is missing?

nit: since media_time_to_audio_pts is called immediately after get_media_time, I think this API can be simplified to something like

frame.pts = self.media_clock.to_pts(AUDIO_CLOCK_RATE)

Since AUDIO_CLOCK_RATE is being manually set elsewhere anyway, I am not sure if there really needs to be separate API entry points for audio / video
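A sketch of that unified API, under the assumption that `MediaClock` is a simple monotonic clock (the class body here is illustrative, not the PR's implementation):

```python
import time

AUDIO_CLOCK_RATE = 48_000   # WebRTC audio clock
VIDEO_CLOCK_RATE = 90_000   # standard RTP video clock

class MediaClock:
    """One entry point: callers pass the clock rate instead of picking
    a media-specific helper."""
    def __init__(self):
        self._start = time.monotonic()
    def get_media_time(self):
        return time.monotonic() - self._start
    def to_pts(self, clock_rate):
        return int(self.get_media_time() * clock_rate)

clock = MediaClock()
audio_pts = clock.to_pts(AUDIO_CLOCK_RATE)
video_pts = clock.to_pts(VIDEO_CLOCK_RATE)
```

Both tracks then share one code path, and adding a new media kind is just a new rate constant.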
